AITopics | presentation slide

presentation slide

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks

Sinhamahapatra, Supriti, Niehues, Jan

arXiv.org Artificial Intelligence, Oct-17-2025

State-of-the-art (SOTA) Automatic Speech Recognition (ASR) systems primarily rely on acoustic information while disregarding additional multi-modal context. However, visual information is essential for disambiguation and adaptation. While most work focuses on speaker images to handle noisy conditions, this work also focuses on integrating presentation slides for the use case of scientific presentations. In a first step, we create a benchmark for multi-modal presentations, including an automatic analysis of transcribing domain-specific terminology. Next, we explore methods for augmenting speech models with multi-modal information. We mitigate the lack of datasets with accompanying slides by a suitable approach of data augmentation. Finally, we train a model using the augmented dataset, resulting in a relative reduction in word error rate of approximately 34% across all words and 35% for domain-specific terms compared to the baseline model.
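As a worked illustration of the metric quoted above, the following minimal sketch computes word error rate (WER) as word-level edit distance divided by reference length, and then the relative reduction between a baseline and an improved system. The function names and the example WER values (0.20 and 0.132) are assumptions chosen for illustration; they are not figures or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical WERs: 0.20 for a baseline, 0.132 for a slide-aware model.
    print(relative_reduction(0.20, 0.132))
```

Note that a relative reduction of 34% is measured against the baseline's own error rate, so it is not the same as a drop of 34 percentage points.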

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.13979

Country:

North America > United States > Minnesota (0.28)
Europe > Germany (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Index-MSR: A high-efficiency multimodal fusion framework for speech recognition

Chen, Jinming, Wang, Lu, Song, Zheshu, Deng, Wei

arXiv.org Artificial Intelligence, Sep-30-2025

Driven by large-scale datasets and LLM-based architectures, automatic speech recognition (ASR) systems have achieved remarkable improvements in accuracy. However, challenges persist for domain-specific terminology and short utterances lacking semantic coherence, where recognition performance often degrades significantly. At the core of Index-MSR is a novel Multimodal Fusion Decoder (MFD), which effectively incorporates text-related information from videos (e.g., subtitles and presentation slides) into speech recognition. This cross-modal integration not only enhances overall ASR accuracy but also yields substantial reductions in substitution errors. Extensive evaluations on both an in-house subtitle dataset and a public AVSR dataset demonstrate that Index-MSR achieves state-of-the-art accuracy, with substitution errors reduced by 20-50%. These results demonstrate that our approach efficiently exploits text-related cues from video to improve speech recognition accuracy, showing strong potential in applications requiring strict audio-text synchronization, such as audio translation.

artificial intelligence, information, speech recognition, (15 more...)

arXiv.org Artificial Intelligence

2509.22744

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Seeing Like a Designer Without One: A Study on Unsupervised Slide Quality Assessment via Designer Cue Augmentation

Inui, Tai, Oh, Steven, Kuan, Magdeline

arXiv.org Artificial Intelligence, Aug-28-2025

We present an unsupervised slide-quality assessment pipeline that combines seven expert-inspired visual-design metrics (whitespace, colorfulness, edge density, brightness contrast, text density, color harmony, layout balance) with CLIP-ViT embeddings, using Isolation Forest-based anomaly scoring to evaluate presentation slides. Trained on 12k professional lecture slides and evaluated on six academic talks (115 slides), our method achieved Pearson correlations up to 0.83 with human visual-quality ratings, 1.79x to 3.23x stronger than scores from leading vision-language models (ChatGPT o4-mini-high, ChatGPT o3, Claude Sonnet 4, Gemini 2.5 Pro). We demonstrate convergent validity with visual ratings, discriminant validity against speaker-delivery scores, and exploratory alignment with overall impressions. Our results show that augmenting low-level design cues with multimodal embeddings closely approximates audience perceptions of slide quality, enabling scalable, objective feedback in real time. Slideware such as PowerPoint, Keynote and Google Slides has become the primary visual channel in classrooms, boardrooms and pitch competitions.

anomaly score, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.19289

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DesignLab: Designing Slides Through Iterative Detection and Correction

Yun, Jooyeol, Wang, Heng, Shimose, Yotaro, Choo, Jaegul, Takamatsu, Shingo

arXiv.org Artificial Intelligence, Jul-24-2025

Designing high-quality presentation slides can be challenging for non-experts due to the complexity involved in navigating various design choices. Numerous automated tools can suggest layouts and color schemes, yet often lack the ability to refine their own output, which is a key aspect in real-world workflows. We propose DesignLab, which separates the design process into two roles: the design reviewer, who identifies design-related issues, and the design contributor, who corrects them. This decomposition enables an iterative loop where the reviewer continuously detects issues and the contributor corrects them, allowing a draft to be further polished with each iteration, reaching qualities that were previously unattainable. We fine-tune large language models for these roles and simulate intermediate drafts by introducing controlled perturbations, enabling the design reviewer to learn design errors and the contributor to learn how to fix them. Our experiments show that DesignLab outperforms existing design-generation methods, including a commercial tool, by embracing the iterative nature of designing, which can result in polished, professional slides.
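The reviewer/contributor loop described above can be sketched in a few lines. This is only a toy illustration of the control flow, under loud assumptions: in the paper both roles are fine-tuned large language models, whereas here they are replaced by hand-written rules, and `TextBox`, `SLIDE_WIDTH`, `MIN_FONT`, `review`, `correct` and `refine` are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

SLIDE_WIDTH = 1280  # hypothetical slide width in pixels
MIN_FONT = 18       # hypothetical minimum readable font size in points


@dataclass(frozen=True)
class TextBox:
    name: str
    font_size: int  # points
    left: int       # pixels from the slide's left edge
    width: int      # pixels


def review(slide: List[TextBox]) -> Optional[Tuple[int, str]]:
    """Reviewer role (rule-based stand-in): report the first issue found, or None."""
    for i, box in enumerate(slide):
        if box.font_size < MIN_FONT:
            return i, "font_too_small"
        if box.left + box.width > SLIDE_WIDTH:
            return i, "overflows_slide"
    return None


def correct(slide: List[TextBox], finding: Tuple[int, str]) -> List[TextBox]:
    """Contributor role (rule-based stand-in): fix exactly the reported issue."""
    i, issue = finding
    box = slide[i]
    if issue == "font_too_small":
        fixed = replace(box, font_size=MIN_FONT)
    else:
        width = min(box.width, SLIDE_WIDTH)
        fixed = replace(box, width=width, left=SLIDE_WIDTH - width)
    return slide[:i] + [fixed] + slide[i + 1:]


def refine(slide: List[TextBox], max_rounds: int = 10) -> Tuple[List[TextBox], int]:
    """Alternate detection and correction until the reviewer finds nothing."""
    for round_no in range(max_rounds):
        finding = review(slide)
        if finding is None:
            return slide, round_no
        slide = correct(slide, finding)
    return slide, max_rounds


if __name__ == "__main__":
    draft = [TextBox("title", 12, 0, 600), TextBox("body", 20, 900, 500)]
    final, rounds = refine(draft)
    print(rounds, final)
```

Fixing one issue per round keeps each correction small and lets the reviewer re-inspect the whole draft after every change; the `max_rounds` cap is a safeguard added for this sketch so the loop always terminates.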

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.17202

Genre:

Research Report (1.00)
Instructional Material (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides

Zhao, Jinghua, Jia, Yuhang, Wang, Shiyao, Zhou, Jiaming, Wang, Hui, Qin, Yong

arXiv.org Artificial Intelligence, Apr-22-2025

Incorporating visual modalities to assist Automatic Speech Recognition (ASR) tasks has led to significant improvements. However, existing Audio-Visual Speech Recognition (AVSR) datasets and methods typically rely solely on lip-reading information or speaking contextual video, neglecting the potential of combining these different valuable visual cues within the speaking context. In this paper, we release a multimodal Chinese AVSR dataset, Chinese-LiPS, comprising 100 hours of speech, video, and corresponding manual transcription, with the visual modality encompassing both lip-reading information and the presentation slides used by the speaker. Based on Chinese-LiPS, we develop a simple yet effective pipeline, LiPS-AVSR, which leverages both lip-reading and presentation slide information as visual modalities for AVSR tasks. Experiments show that lip-reading and presentation slide information improve ASR performance by approximately 8% and 25%, respectively, with a combined performance improvement of about 35%. The dataset is available at https://kiri0824.github.io/Chinese-LiPS/

artificial intelligence, information, speech recognition, (15 more...)

arXiv.org Artificial Intelligence

2504.15066

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

PASS: Presentation Automation for Slide Generation and Speech

Aggarwal, Tushar, Bhand, Aarohi

arXiv.org Artificial Intelligence, Jan-15-2025

In today's fast-paced world, effective presentations have become an essential tool for communication in both online and offline meetings. The crafting of a compelling presentation requires significant time and effort, from gathering key insights to designing slides that convey information clearly and concisely. However, despite the wealth of resources available, people often find themselves manually extracting crucial points, analyzing data, and organizing content in a way that ensures clarity and impact. Furthermore, a successful presentation goes beyond just the slides; it demands rehearsal and the ability to weave a captivating narrative to fully engage the audience. Although there has been some exploration of automating document-to-slide generation, existing research is largely centered on converting research papers. In addition, automation of the delivery of these presentations has yet to be addressed. We introduce PASS, a pipeline used to generate slides from general Word documents, going beyond just research papers, which also automates the oral delivery of the generated slides. PASS analyzes user documents to create a dynamic, engaging presentation with an AI-generated voice. Additionally, we developed an LLM-based evaluation metric to assess our pipeline across three critical dimensions of presentations: relevance, coherence, and redundancy. The data and codes are available at https://github.com/AggarwalTushar/PASS.

delivery, pipeline, presentation, (15 more...)

arXiv.org Artificial Intelligence

2501.06497

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Increasing the Accessibility of Causal Domain Knowledge via Causal Information Extraction Methods: A Case Study in the Semiconductor Manufacturing Industry

Razouk, Houssam, Benischke, Leonie, Garber, Daniel, Kern, Roman

arXiv.org Artificial Intelligence, Nov-15-2024

The extraction of causal information from textual data is crucial in the industry for identifying and mitigating potential failures, enhancing process efficiency, prompting quality improvements, and addressing various operational challenges. This paper presents a study on the development of automated methods for causal information extraction from actual industrial documents in the semiconductor manufacturing industry. The study proposes two types of causal information extraction methods, single-stage sequence tagging (SST) and multi-stage sequence tagging (MST), and evaluates their performance using existing documents from a semiconductor manufacturing company, including presentation slides and FMEA (Failure Mode and Effects Analysis) documents. The study also investigates the effect of representation learning on downstream tasks. The presented case study showcases that the proposed MST methods for extracting causal information from industrial documents are suitable for practical applications, especially for semi-structured documents such as FMEAs, with a 93% F1 score. Additionally, MST achieves a 73% F1 score on texts extracted from presentation slides. Finally, the study highlights the importance of choosing a language model that is more aligned with the domain and in-domain fine-tuning.
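For readers unfamiliar with the F1 scores quoted above, the following minimal sketch shows how precision, recall and F1 can be computed for extracted cause-effect pairs. It assumes exact-match scoring of whole (cause, effect) pairs, which may differ from the token-level or span-level scheme the study actually uses; the function name and the example pairs are invented for illustration.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (cause, effect)


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of (cause, effect) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical annotations and system output, not data from the study.
    gold = {
        ("particle contamination", "yield loss"),
        ("misaligned mask", "pattern defect"),
        ("overheating", "wafer warpage"),
    }
    predicted = {
        ("particle contamination", "yield loss"),
        ("misaligned mask", "pattern defect"),
        ("overheating", "yield loss"),  # wrong effect, counts as an error
    }
    print(precision_recall_f1(predicted, gold))
```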

data mining, machine learning, relation, (20 more...)

arXiv.org Artificial Intelligence

2411.10172

Country:

Europe > Austria > Styria > Graz (0.04)
North America > Puerto Rico > Peñuelas > Peñuelas (0.04)
Europe > Greece (0.04)
(10 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Security & Privacy (0.93)
Information Technology > Hardware (0.91)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Google is quietly building an omnipresent AI that will be linked to all your devices and apps - and 'knows everything about your life'

Daily Mail - Science & tech, Dec-8-2023, 22:49:32 GMT

Confidential documents presented at a recent internal Google summit detail the tech giant's plan to create an artificial intelligence (AI) designed to become its users' 'Life Story Teller.' But to do it, the AI will require unprecedented access to each user's personal data. It's unclear where this experimental AI, currently dubbed 'Project Ellmann,' will reside among Google's apps and services, but the team behind it works for Google Photos, and their presentation suggested a tailored AI chatbot. 'We can't answer tough questions or tell good stories without a bird's-eye view of your life,' read one portion of the presentation, made by a Google product manager. Building off the company's ChatGPT rival Gemini, Project Ellmann will use 'large language models' (LLMs) to synthesize personal information from context said to include biographies of users and their loved ones, as well as stored photo 'moments.' But the new developments may spark alarm from those outraged by Google's secret collection of millions of individuals' sensitive medical records, code-named Project Nightingale in 2019, or anyone who eagerly collects digital privacy tips.

google, project ellmann, presentation, (14 more...)

Daily Mail - Science & tech

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

The 7 Reasons Most Machine Learning Funds Fail (Presentation Slides)

#artificialintelligence, Apr-23-2022, 20:55:27 GMT

The rate of failure in quantitative finance is high, and particularly so in financial machine learning. The few managers who succeed amass a large amount of assets, and deliver consistently exceptional performance to their investors. However, that is a rare outcome, for reasons that will become apparent in this presentation. Over the past two decades, I have seen many faces come and go, firms started and shut down. In my experience, there are 7 critical mistakes underlying most of those failures. This paper is partly based on the book Advances in Financial Machine Learning (Wiley, 2018).

machine learning fund fail, presentation slide

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.98)

Extractive Research Slide Generation Using Windowed Labeling Ranking

Sefid, Athar, Wu, Jian, Mitra, Prasenjit, Giles, Lee

arXiv.org Artificial Intelligence, Jun-6-2021

Presentation slides describing the content of scientific and technical papers are an efficient and effective way to present that work. However, manually generating presentation slides is labor intensive. We propose a method to automatically generate slides for scientific papers based on a corpus of 5000 paper-slide pairs compiled from conference proceedings websites. The sentence labeling module of our method is based on SummaRuNNer, a neural sequence model for extractive summarization. Instead of ranking sentences based on semantic similarities in the whole document, our algorithm measures importance and novelty of sentences by combining semantic and lexical features within a sentence window. Our method outperforms several baseline methods including SummaRuNNer by a significant margin in terms of ROUGE score.

proceedings, slide generation, summarization, (15 more...)

arXiv.org Artificial Intelligence

2106.03246

Country:

North America > United States > Pennsylvania (0.05)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
